62 research outputs found

    Optimization by gradient boosting

    Full text link
    Gradient boosting is a state-of-the-art prediction technique that sequentially produces a model in the form of linear combinations of simple predictors, typically decision trees, by solving an infinite-dimensional convex optimization problem. In the present paper, we provide a thorough analysis of two widespread versions of gradient boosting and introduce a general framework for studying these algorithms from the point of view of functional optimization. We prove their convergence as the number of iterations tends to infinity and highlight the importance of having a strongly convex risk functional to minimize. We also present a reasonable statistical context ensuring consistency properties of the boosting predictors as the sample size grows. In our approach, the optimization procedures are run forever (that is, without resorting to an early stopping strategy), and statistical regularization is essentially achieved via an appropriate L^2 penalization of the loss and strong convexity arguments.
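    A minimal sketch of the functional-gradient view described above, assuming squared-error loss, depth-1 regression trees as base predictors, a constant step size, and a ridge-type L^2 penalty on the aggregated predictor; these choices are illustrative and not the paper's exact algorithm.

```python
# Minimal L2-boosting sketch with an L^2 penalty on the aggregated predictor F.
# Illustrative assumptions: squared loss, stumps, constant step size.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 3))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=500)

nu, lam, n_iter = 0.1, 1e-2, 200         # step size, L^2 penalty, iterations
F = np.zeros_like(y)                      # current aggregated predictor F_m(x_i)
learners = []

for m in range(n_iter):
    # Negative functional gradient of (1/n) sum (y - F)^2 + lam * ||F||^2
    residual = (y - F) - lam * F
    tree = DecisionTreeRegressor(max_depth=1).fit(X, residual)
    learners.append(tree)
    F += nu * tree.predict(X)             # gradient step in function space

print("penalized training risk:", np.mean((y - F) ** 2) + lam * np.mean(F ** 2))
```

    Because the penalized risk is strongly convex in F, the iterates keep improving without early stopping, which is the regularization mechanism emphasized in the abstract.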

    On symmetric sensitivity

    No full text
    We define the concept of symmetric sensitivity with respect to initial conditions for endomorphisms on Lebesgue metric spaces. The idea is that the orbits of almost every pair of nearby initial points (with respect to the product of the invariant measure) of a symmetrically sensitive map may diverge by a positive quantity independent of the initial points. We study the relationships between symmetric sensitivity and weak mixing, and between symmetric sensitivity and positivity of the metric entropy, and we compute the largest sensitivity constant.

    Statistical analysis of k-nearest neighbor collaborative recommendation

    Get PDF
    Collaborative recommendation is an information-filtering technique that attempts to present information items that are likely of interest to an Internet user. Traditionally, collaborative systems deal with situations involving two types of variables, users and items. In its most common form, the problem is framed as trying to estimate ratings for items that have not yet been consumed by a user. Despite wide-ranging literature, little is known about the statistical properties of recommendation systems. In fact, no clear probabilistic model even exists that would allow us to precisely describe the mathematical forces driving collaborative filtering. To provide an initial contribution to this, we propose a general sequential stochastic model for collaborative recommendation. We offer an in-depth analysis of the so-called cosine-type nearest neighbor collaborative method, which is one of the most widely used algorithms in collaborative filtering, and analyze its asymptotic performance as the number of users grows. We establish consistency of the procedure under mild assumptions on the model. Rates of convergence and examples are also provided. Published in the Annals of Statistics (http://www.imstat.org/aos/), http://dx.doi.org/10.1214/09-AOS759, by the Institute of Mathematical Statistics (http://www.imstat.org).
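    For concreteness, a toy sketch of the cosine-type nearest neighbor rule analyzed here: a missing rating is predicted by averaging the ratings of the k users most similar, in cosine similarity, to the active user. The tiny rating matrix, the restriction of the similarity to co-rated items, and the unweighted average are illustrative assumptions, not the paper's exact setup.

```python
# Toy cosine-type k-NN collaborative filtering sketch (illustrative assumptions:
# similarities computed on commonly rated items, unweighted neighbor average).
import numpy as np

R = np.array([            # rows = users, columns = items, 0 = not yet rated
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 5, 4],
], dtype=float)

def cosine(u, v):
    mask = (u > 0) & (v > 0)              # items rated by both users
    if not mask.any():
        return 0.0
    a, b = u[mask], v[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(R, user, item, k=2):
    sims = [(cosine(R[user], R[v]), v) for v in range(len(R))
            if v != user and R[v, item] > 0]
    top = sorted(sims, reverse=True)[:k]
    return np.mean([R[v, item] for _, v in top]) if top else np.nan

print(predict(R, user=0, item=2))         # predicted rating of item 2 for user 0
```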

    A nonparametric test for Cox processes

    Full text link
    In a functional setting, we propose two test statistics to highlight the Poisson nature of a Cox process when n copies of the process are available. Our approach involves a comparison of the empirical mean and the empirical variance of the functional data and can be seen as an extended version of a classical overdispersion test for count data. The limiting distributions of our statistics are derived using a functional central limit theorem for càdlàg martingales. We also study the asymptotic power of our tests under local alternatives. Our procedure is easily implementable and does not require any knowledge of covariates. A numerical study reveals the good performance of the method. We also present two applications of our tests to real data sets.
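    A rough numerical illustration of the underlying idea, assuming n i.i.d. copies of the counting process observed on [0, T] and comparing, at the terminal time only, the empirical variance with the empirical mean of the counts (for a Poisson process they coincide). The classical Fisher dispersion index used below is a simplified stand-in for the functional statistics studied in the paper.

```python
# Simplified dispersion-type check at the terminal time T (not the paper's
# functional statistics): for a Poisson process, Var N(T) = E N(T).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, T = 300, 1.0

# Cox process: random intensity Lambda ~ Gamma, counts N(T) | Lambda ~ Poisson(Lambda * T)
lam = rng.gamma(shape=2.0, scale=2.0, size=n)
counts = rng.poisson(lam * T)

mean, var = counts.mean(), counts.var(ddof=1)
D = (n - 1) * var / mean                  # Fisher dispersion index
p_value = stats.chi2.sf(D, df=n - 1)      # large D signals overdispersion
print("dispersion index:", D, "p-value:", p_value)
```

    A small p-value rejects the Poisson hypothesis in favor of genuine Cox (overdispersed) behavior; the paper's tests compare the mean and variance as functions of time rather than at a single time point.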

    Dimension reduction in regression estimation with nearest neighbor

    No full text
    In regression with a high-dimensional predictor vector, dimension reduction methods aim at replacing the predictor by a lower-dimensional version without loss of information on the regression. In this context, the so-called central mean subspace is the key to dimension reduction. The last two decades have seen the emergence of many methods to estimate the central mean subspace. In this paper, we go one step further and study the performance of a k-nearest neighbor type estimate of the regression function, based on an estimator of the central mean subspace. The estimate is first proved to be consistent. The improvement due to the dimension reduction step is then observed in terms of its rate of convergence. All the results are distribution-free. As an application, we give an explicit rate of convergence using the SIR method.
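    A hedged sketch of the two-step procedure on a toy single-index example: a basic sliced inverse regression (SIR) step estimates a one-dimensional central mean subspace, and a k-nearest neighbor regression is then run on the projected predictor. The data-generating model, the number of slices, and k are illustrative choices, not the paper's.

```python
# Toy sketch: SIR direction estimate followed by k-NN regression on the
# projected predictor (model, number of slices H, and k are illustrative).
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(2)
n, d = 1000, 6
X = rng.normal(size=(n, d))
beta = np.array([1.0, -1.0, 0, 0, 0, 0]) / np.sqrt(2)    # true index direction
y = np.exp(X @ beta) + 0.1 * rng.normal(size=n)

# --- basic SIR estimate of a one-dimensional central mean subspace ---
Xc = X - X.mean(axis=0)
Sigma = np.cov(Xc, rowvar=False)
L = np.linalg.cholesky(Sigma)
Z = Xc @ np.linalg.inv(L).T                               # standardized predictors
H = 10
slices = np.array_split(np.argsort(y), H)
M = sum(len(s) / n * np.outer(Z[s].mean(axis=0), Z[s].mean(axis=0)) for s in slices)
eigval, eigvec = np.linalg.eigh(M)
direction = np.linalg.solve(L.T, eigvec[:, -1])           # back to the X scale
direction /= np.linalg.norm(direction)

# --- k-NN regression on the one-dimensional projection ---
proj = (Xc @ direction).reshape(-1, 1)
knn = KNeighborsRegressor(n_neighbors=15).fit(proj, y)
print("estimated direction:", np.round(direction, 2))
```

    Running the neighbor search on the projected, one-dimensional predictor rather than on the raw six-dimensional one is what yields the faster rate of convergence discussed in the abstract.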

    Cox process functional learning

    No full text
    This article addresses the problem of supervised classification of Cox process trajectories whose random intensity is driven by some exogenous random covariate. The classification task is achieved through a regularized convex empirical risk minimization procedure, and a nonasymptotic oracle inequality is derived. We show that the algorithm provides a Bayes-risk consistent classifier. Furthermore, it is proved that the classifier converges at a rate which adapts to the unknown regularity of the intensity process. Our results are obtained by taking advantage of martingale and stochastic calculus arguments, which are natural in this context and fully exploit the functional nature of the problem.
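    As a very rough stand-in for the paper's estimator, the sketch below illustrates regularized convex empirical risk minimization on simulated counting trajectories: each trajectory is represented by its counts on a fixed time grid and an L2-penalized logistic loss is minimized. The class-dependent intensity model, the grid representation, and the logistic loss are all assumptions made for illustration only.

```python
# Regularized convex ERM illustration on simulated counting trajectories
# (illustrative stand-in: grid-evaluated counts + L2-penalized logistic loss).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n, grid = 400, np.linspace(0, 1, 20)

def trajectory(label):
    lam = rng.gamma(2.0, 2.0) * (1.5 if label == 1 else 1.0)  # random intensity
    times = np.cumsum(rng.exponential(1 / lam, size=50))      # event times
    return np.searchsorted(times, grid)                       # N(t) on the grid

y = rng.integers(0, 2, size=n)
Xf = np.array([trajectory(lb) for lb in y], dtype=float)

clf = LogisticRegression(C=1.0, max_iter=1000).fit(Xf, y)     # L2-regularized ERM
print("training accuracy:", clf.score(Xf, y))
```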

    Clustering by Estimation of Density Level Sets at a Fixed Probability

    No full text
    In density-based clustering methods, the clusters are defined as the connected components of the upper level sets of the underlying density f. In this setting, the practitioner fixes a probability p and associates with it a threshold t^{(p)} such that the level set {f ≥ t^{(p)}} has probability p with respect to the distribution induced by f. This paper is devoted to the estimation of the threshold t^{(p)}, of the level set {f ≥ t^{(p)}}, as well as of the number k(t^{(p)}) of connected components of this level set. Given a nonparametric density estimate f_n of f based on an i.i.d. n-sample drawn from f, we first propose a computationally simple estimate t_n^{(p)} of t^{(p)}, and we establish a concentration inequality for this estimate. Next, we consider the plug-in level set estimate {f_n ≥ t_n^{(p)}}, and we establish the exact convergence rate of the Lebesgue measure of the symmetric difference between {f ≥ t^{(p)}} and {f_n ≥ t_n^{(p)}}. Finally, we propose a computationally simple graph-based estimate of k(t^{(p)}), which is shown to be consistent. Thus, the methodology yields a complete procedure for analyzing the grouping structure of the data, as p varies over (0, 1).
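    A minimal plug-in sketch in the spirit of this setup: fix p, estimate the threshold from the empirical quantile of the estimated density evaluated at the sample (since {f ≥ t^{(p)}} has probability p, t^{(p)} is a (1 - p)-quantile of f(X)), then recover clusters as connected components of a neighborhood graph on the points above the threshold. The kernel density estimator, the bandwidth, and the graph radius are illustrative choices, not the paper's tuning.

```python
# Plug-in sketch: threshold estimated from the density values at the sample,
# then a neighborhood-graph estimate of the connected components.
# Bandwidth, graph radius, and p are illustrative choices.
import numpy as np
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import cdist
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-2, 0.5, (150, 2)), rng.normal(2, 0.5, (150, 2))])

p = 0.8
kde = KernelDensity(bandwidth=0.4).fit(X)
dens = np.exp(kde.score_samples(X))               # f_n evaluated at the sample
t_n = np.quantile(dens, 1 - p)                    # estimate of the threshold t^{(p)}

high = X[dens >= t_n]                             # points in the plug-in level set
adj = cdist(high, high) <= 0.8                    # neighborhood graph (radius 0.8)
k, labels = connected_components(adj, directed=False)
print("estimated number of clusters at p =", p, ":", k)
```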

    On the estimation of the support of a density

    No full text
    Given an unknown multivariate probability density f with compact support and an i.i.d. n-sample drawn from f, we study the estimator of the support of f defined as the union of balls of radius r_n centered at the observations. To measure the quality of the estimation, we use a general criterion based on the volume of the symmetric difference. Under mild assumptions, and using tools from Riemannian geometry, we establish the exact convergence rates of the support estimator and examine the statistical consequences of these results.
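    A small numerical illustration of this estimator: the support estimate is the union of balls of radius r_n around the observations, and its quality can be gauged by a Monte Carlo approximation of the volume of the symmetric difference with the true support. The unit-disc example, the radius choice, and the bounding box are illustrative assumptions.

```python
# Monte Carlo sketch: support estimated by a union of balls of radius r_n around
# the sample; quality measured by the volume of the symmetric difference with the
# true support (here a unit disc; r_n and the example are illustrative).
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(5)
n = 2000
theta = rng.uniform(0, 2 * np.pi, n)
rad = np.sqrt(rng.uniform(0, 1, n))
X = np.column_stack([rad * np.cos(theta), rad * np.sin(theta)])  # uniform on the disc

r_n = (np.log(n) / n) ** 0.5                      # illustrative radius choice
tree = cKDTree(X)

# Monte Carlo points on a bounding box, compared against both sets
U = rng.uniform(-1.5, 1.5, size=(200_000, 2))
in_support = np.linalg.norm(U, axis=1) <= 1.0                    # true support
in_estimate = tree.query(U, k=1)[0] <= r_n                       # union of balls
sym_diff_volume = 3.0 * 3.0 * np.mean(in_support != in_estimate) # box area x rate
print("estimated volume of the symmetric difference:", sym_diff_volume)
```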

    On the kernel rule for function classification

    No full text
    Let X be a random variable taking values in a function space F, and let Y be a discrete random label with values 0 and 1. We investigate asymptotic properties of the moving window classification rule based on independent copies of the pair (X, Y). Contrary to the finite-dimensional case, it is shown that the moving window classifier is not universally consistent, in the sense that its probability of error may not converge to the Bayes risk for some distributions of (X, Y). Sufficient conditions both on the space F and on the distribution of X are then given to ensure consistency.
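    A small sketch of the moving window rule on curves discretized on a common grid: a new curve is classified by a majority vote among training curves lying within distance h of it, falling back to the overall majority when the window is empty. The sup-norm distance, the bandwidth h, and the simulated curve model are illustrative assumptions.

```python
# Moving window rule sketch for curves observed on a common grid: majority vote
# among training curves within distance h (sup norm and h are illustrative).
import numpy as np

rng = np.random.default_rng(6)
grid = np.linspace(0, 1, 50)

def sample_curve(label):
    shift = 0.3 if label == 1 else 0.0
    return np.sin(2 * np.pi * grid) + shift + 0.05 * rng.normal(size=grid.size)

y_train = rng.integers(0, 2, size=200)
X_train = np.array([sample_curve(lb) for lb in y_train])

def moving_window_classify(x_new, h=0.25):
    dist = np.max(np.abs(X_train - x_new), axis=1)   # sup-norm distances
    window = y_train[dist <= h]
    votes = window if window.size else y_train        # fall back to a global vote
    return int(votes.mean() >= 0.5)

x_new = sample_curve(1)
print("predicted label:", moving_window_classify(x_new))
```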